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fCj . Abstract 

^-H ■ 

("^^ ' Chemical reaction networks (CRNs) formally model chemistry in a well-mixed solution. 

^sj , CRNs are widely used to describe information processing occurring in natural cellular regulatory 

^ ' networks, and with upcoming advances in synthetic biology, CRNs are a promising programming 

Qh, language for the design of artificial molecular control circuitry. Due to a formal equivalence 

■^^ ' between CRNs and a model of distributed computing known as population protocols, results 

transfer readily between the two models. 

We show that if a CRN respects finite density (at most 0{n) additional molecules can 

be produced from n initial molecules), then starting from any dense initial configuration (all 

>«' ' molecular species initially present have initial count ^{n), where n is the initial molecular count 

\^ I and volume), then every producible species is produced in constant time with high probability. 

c/3 ' This implies that no CRN obeying the stated constraints can function as a timer, able to 

, ^, I produce a molecule, but doing so only after a time that is an unbounded function of the input 

size. This has consequences regarding an open question of Angluin, Aspnes, and Eisenstat 

concerning the ability of population protocols to perform fast, reliable leader election and to 

^ ' simulate arbitrary algorithms from a uniform initial state. 

Q ■ 1 Introduction 

Q I 1.1 Background of the field 

The engineering of complex artificial molecular systems will require a sophisticated understanding 
of how to program chemistry. A natural language for describing the interactions of molecular species 
in a well-mixed solution is that of (finite) chemical reaction networks (CRNs), i.e., finite sets of 
chemical reactions such asA + B^^A + C. When the behavior of individual molecules is modeled, 
C^ . CRNs are assigned semantics through stochastic chemical kinetics [14], in which reactions occur 

probabilistically with rate proportional to the product of the molecular count of their reactants 
and inversely proportional to the volume of the reaction vessel. The kinetic model of CRNs is 
based on the physical assumption of well-mixedness valid in a dilute solution. Thus, we assume 
the finite density constraint, which stipulates that a volume required to execute a CRN must be 
proportional to the maximum molecular count obtained during execution [20]. In other words, 
the total concentration (molecular count per volume) is bounded. This realistically constrains the 
speed of the computation achievable by CRNs. 
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Traditionally CRNs have been used as a descriptive language to analyze naturally occurring 
chemical reactions, as well as numerous other systems with a large number of interacting compo- 
nents such as gene regulatory networks and animal populations. However, recent investigations 
have viewed CRNs as a programming language for engineering artificial systems. These works 
have shown CRNs to have eclectic algorithmic abilities. Researchers have investigated the power 
of CRNs to simulate Boolean circuits [17], neural networks [15], and digital signal processing [16]. 
Other work has shown that bounded-space Turing machines can be simulated with an arbitrar- 
ily small, non-zero probability of error by a CRN with only a polynomial slowdown [3].^ Space- 
and energy-efficient simulation of space-bounded Turing machines can be done by CRNs imple- 
mentable by logically and thermodynamically reversible DNA strand displacement reactions, and 
as a consequence it is PSPACE-hard to predict whether a particular species is producible [22]. 
Even Turing-universal computation is possible with an arbitrarily small probability of error over all 
time [20]. The computational power of CRNs also provides insight on why it can be computation- 
ally difficult to simulate them [19], and why certain questions are frustratingly difficult to answer 
(e.g. undecidable) [12,23]. The programming approach to CRNs has also, in turn, resulted in novel 
insights regarding natural cellular regulatory networks [6]. 

1.2 Focus of this paper 

Informally, our main theorem shows that CRNs "respecting finite density" (meaning that the total 
molecular count obtainable is bounded by a constant times the initial molecular count) starting from 
any dense initial configuration (meaning that any species present initially has count i}(n), where 
n is the initial molecular count and volume) will create J7(n) copies of every producible species in 
constant time with high probability. We now explain the significance of this result in the context 
of an open question regarding the ability of stochastic CRNs to do fast, reliable computation. 

Many of the fundamental algorithmic abilities and limitations of CRNs were elucidated under 
the guise of a related model of distributed computing known as population protocols. Population 
protocols were introduced by Angluin, Aspnes, Diamadi, Fischer, and Peralta [1] as a model of 
resource-limited mobile sensors; see Aspnes and Ruppert [4] for an excellent survey of the model. 
A population protocol consists of a set of n agents with finite state set A (where we imagine 
n ^ |A|), together with a transition function 5 : A^ — >■ A^, with {ri,r2) = 5{qi,q2) indicating 
that if two agents in states qi and q2 interact, then they change to states r\ and r2, respectively. 
Multiple semantic models may be overlaid on this syntactic definition, but the randomized model 
studied by Angluin, Aspnes, and Eisenstat [3], in which the next two agents to interact are selected 
uniformly at random, coincides precisely with the model of stochastic chemical kinetics, so long as 
the CRN's reactions each have two reactants and two products,^ i.e., qi + q2 ^ fi + ^2- In fact, 
every population protocol, when interpreted as a CRN, respects finite density, because the total 
molecular count is constant over time. Therefore, although we state our results in the language of 
CRNs, the results apply a fortiori to population protocols. 

Angluin, Aspnes, and Eisenstat [3], and independently Soloveichik, Cook, Winfree, and Bruck [20] 
showed that arbitrary Turing machines may be simulated by randomized population protocols/CRNs 
with only a polynomial-time slowdown. Because a binary input x G {0, 1}* to a Turing machine 
must be specified in a CRN by a unary molecular count n € N, where n ~ 2l^l, this means that 



^This is surprising since finite CRNs necessarily must represent binary data strings in a unary encoding, since 
they lack positional information to tell the difference between two molecules of the same species. 

This turns out not to be a significant restriction on CRNs; see the introduction of [7] for a discussion of the issue. 



each construction is able to simulate each step of the Turing machine in time 0(polylog(n)), i.e. 
polynomial in |x|. Both constructions make essential use of the ability to specify arbitrary initial 
configurations, where a configuration is a vector c G N specifying the initial count c{S) of each 
species 5 € A. In particular, both constructions require a certain species to be present with initial 
count 1, a so-called "leader" (hence the title of the former paper). 

A test tube with a single copy of a certain type of molecule is experimentally difficult to prepare. 
The major open question of [3] asks whether this restriction may be removed: whether there is a 
CRN starting from a uniform initial configuration (i.e., with only a single species present, whose 
count is equal to the volume n) that can simulate a Turing machine. 

Angluin, Aspnes, and Eisenstat [3] observe that a leader may be elected from a uniform initial 
configuration of n copies of L by the reaction L + L^L + N; in other words, when two candidate 
leaders encounter each other, one drops out. However, this scheme has problems: 

1. The expected time until a single leader remains is 0(n), i.e., exponential in m, where m ~ 
logn is the number of bits required to represent the input n. In contrast, if a leader is 
assumed present from the start, the computation takes time 0(i(logn) • log n), where t{m) 
is the running time of the Turing machine on an m-h\t input, i.e., the time is polynomial in 
t. Therefore for polynomial-time computations, electing a leader in this manner incurs an 
exponential slowdown. 

2. There is no way for the leader to "know" when it has been elected. Therefore, even if a single 
leader is eventually elected, and even if the CRN works properly under the assumption of a 
single initial leader, prior to the conclusion of the election, multiple leaders will be competing 
and introducing unwanted behavior. 

These problems motivate the need for a timer CRN, a CRN that, from a uniform initial con- 
figuration of size n, is able to produce some species S, but the time before the first copy of S is 
produced is i(n), for some unbounded function t : N ^- N. If a CRN exists that can produce a copy 
of S after VL{n) time, then by adding the reaction S + L ^ ^active, a leader election can take place 
among "inactive" molecules of type L, and the remaining species of the leader-driven CRNs of [3] 
or [20] can be made to work only with the "active" species -Lactivci which will not appear (with high 
probability) until there is only a single copy of L to be activated. This shows how a timer CRN 
could alleviate the second problem mentioned above. In fact, even the first problem, the slowness 
of the naive leader election algorithm, is obviated in the presence of a timer CRN. If a timer CRN 
produces S after t{n) = Q(logn) time from a uniform initial configuration of n copies of X, then 
this can be used to construct a CRN that elects a leader in time 0{t{n)) [9]. In other words, the 
problem of constructing a timer CRN is "leader-election-hard" . 

Unfortunately, timer CRNs cannot be constructed under realistic conditions. The main theorem 
of this paper, stated informally, shows the following (the formal statement is Theorem 3.1 in 
Section 3): Let C be a CRN with volume and initial molecular count n that respects finite density (no 
sequence of reactions can produce more than 0{n) copies of any species) with an initial configuration 
that is dense (all species initially present have initial count J7(n)). Then with probability > 1 — 
2-r2(n)^ every producible species is produced in time 0{1). Since a uniform initial configuration is 
dense, timer CRNs that respect finite density cannot be constructed. It should be noted that the 
condition of respecting finite density is physically realistic; it is obeyed, for instance, by every CRN 
that obeys the law of conservation of mass. As noted previously, all population protocols respect 
finite density, which implies that no population protocol can function as a timer. 



This theorem has the following consequences for CRNs respecting finite density that start from 
a dense initial configuration with n molecules: 

1 . No such CRN can decide a predicate "monotonically" . By this we mean that if the CRN solves 
a decision problem (a yes/no question about an input encoded in its initial configuration) by 
producing species Y if the answer is yes and species N if the answer is no, then the CRN 
cannot be guaranteed to produce only the correct species. It necessarily must produce both 
but eventually dispose of the incorrect species. This implies that composing such a CRN 
with a downstream CRN that uses the answer necessarily requires the downstream CRN to 
account for the possibility that the upstream CRN will "change its mind" before converging 
on the correct answer. 

2. No CRN leader election algorithm can "avoid war": any CRN that elects a unique leader 
(in any amount of time) must necessarily have at least 2 (in fact, at least r2(n)) copies of 
the leader species for some time before the unique leader is elected. This implies that any 
downstream CRN requiring a leader must be designed to work in the face of multiple leaders 
being present simultaneously for some amount of time before a unique leader is finally elected. 

Another line of research that has illustrated a striking difference between "chemical computa- 
tion" and standard computational models has been the role of randomness. The model of stochastic 
chemical kinetics is one way to impose semantics on CRNs, but another reasonable alternative is 
to require that all possible sequences of reactions - and not just those likely to occur by the model 
of chemical kinetics - should allow the CRN to carry out a correct computation. Under this more 
restrictive deterministic model of CRN computation, Angluin, Aspnes, and Eisenstat [2] showed 
that precisely the semilinear predicates (/> : N'^ — )• {0, 1} (those definable in the first-order theory of 
Presburger arithmetic) are deterministically computed by CRNs. This was subsequently extended 
to function computation by Chen, Doty, and Soloveichik [7], who showed that precisely the func- 
tions / : N'^ — 7- N' whose graph is a semilinear set are deterministically computed by CRNs. Since 
semilinear sets are computationally very weak (one characterization is that they are finite unions 
of "periodic" sets, suitably generalizing the definition of periodic to multi-dimensional spaces), this 
shows that introducing randomization adds massive computational ability to CRNs. 

This strongly contrasts other computational models. Finite automata decide the regular lan- 
guages whether they are required to be deterministic or randomized (or even nondeterministic) . 
Turing machines decide the decidable languages whether they are required to be deterministic or 
randomized (or even nondeterministic). In the case of polynomial-time Turing machines, it is widely 
conjectured [18] that P = BPP, i.e., that randomization adds at most a polynomial speedup to any 
predicate computation. Since it is known that P C BPP C EXP, even the potential exponential 
gap between deterministic and randomized polynomial-time computation is dwarfed by the gap 
between deterministic CRN computation (semilinear sets, which are all decidable in linear time by 
a Turing machine) and randomized CRN computation (which can decide any decidable problem). 

Along this line of thinking ("What power does the stochastic CRN model have over the de- 
terministic model?"), our main theorem may also be considered a stochastic extension of work on 
deterministic CRNs by Condon, Hu, Mahuch, and Thachuk [10] and Condon, Kirkpatrick, and 
Maiiuch [11]. Both of those papers showed results of the following form: every CRN in a certain 
restricted class (the class being different in each paper, but in each case is a subset of the class of 
CRNs respecting finite density) with k species and initial configuration i G N has the property that 
every producible species is producible from initial configuration ni ■ i through at most t reactions, 



where m and t are bounded by a polynomial in k. In other words, if the goal of the CRN is to 
delay the production of some species until uj{poly{k)) reactions have occurred (such as the "Grey 
code counter" CRN of [10], which iterates through 2^^ states before producing the first copy of a 
certain species), then the CRN cannot be multi-copy tolerant, since m copies of the CRN running in 
parallel can "short-circuit" and produce every species in poly(A;) reactions, if the reactions are care- 
fully chosen. Our main theorem extends these deterministic impossibility results to the stochastic 
model, showing that not only is there a short sequence of reactions that will produce every species, 
but furthermore that the laws of chemical kinetics will force such a sequence to actually happen in 
constant time with high probability. 

The paper is organized as follows. Section 2 defines the model of stochastic chemical reaction 
networks. Section 3 states and proves the main theorem. Theorem 3.1, that every CRN respecting 
finite density, starting from a dense initial configuration, likely produces every producible species 
in constant time. Appendix A proves a Chernoff bound on continuous-time stochastic exponential 
decay processes. Lemma A.l, used in the proof of Theorem 3.1. Appendix B proves a Chernoff 
bound on continuous-time biased random walks, Lemma B.6, a messy-to-state consequence of which 
(Lemma B.7) is used in the proof of Theorem 3.1. Section 4 discusses questions for future work. 

2 Preliminaries 

2.1 Chemical reaction networks 

If A is a finite set of chemical species, we write N to denote the set of functions / : A — >■ 
N. Equivalently, we view an element c G N as a vector of |A| nonnegative integers, with each 
coordinate "labeled" by an element of A. Given 5 G A and c G N , we refer to c{S) as the count 
of S in c. Let ||c|| = ||c||i = '^gi^\ c{S) represent the total count of species in c. We write c < c' 
to denote that c{S) < c'{S) for all S* G A. Given c, c' G N , we define the vector component- wise 
operations of addition c -|- c', subtraction c — c', and scalar multiplication nc for n G N. If A C A, 
we view a vector c G N equivalently as a vector c G N by assuming c{S) = for all 5 G A \ A. 
For all c G N , let [c] = { 5" G A | c{S) > } be the set of species with positive counts in c. 

Given a finite set of chemical species A, a reaction over A is a triple /3 = (r, p, k) G N x N x M"'", 
specifying the stoichiometry of the reactants and products, respectively, and the rate constant k. 
For instance, given A = {A, B, C}, the reaction A+2B H A+'iC is the triple ((1, 2, 0), (1, 0, 3), 4.7). 
A (finite) chemical reaction network (CRN) is a pair C = (A, R), where A is a finite set of chemical 
species, and i? is a finite set of reactions over A. A configuration of a CRN C = (A, R) is a vector 
c G N . When the configuration c is clear from context, we write ij^X to denote c{X). 

Given a configuration c and reaction (3 = (r, p, k), we say that /3 is applicable to c if r < c (i.e., 
c contains enough of each of the reactants for the reaction to occur). If j3 is applicable to c, then 
write /3(c) to denote the configuration c -|- p — r (i.e., the configuration that results from applying 
reaction /3 to c). If c' = /3(c) for some reaction /3 G -R, we write c — >-c c', or merely c — )• c' when 
C is clear from context. An execution (a.k.a., execution sequence) <? is a finite or infinite sequence 
of one or more configurations £ = (cq, Ci, C2, . . .) such that, for all i G {1, . . . , |^|}, Cj_i -^ Cj. If a 
finite execution sequence starts with c and ends with c', we write c -^q c', or merely c — t-* c' when 
the CRN C is clear from context. In this case, we say that c' is reachable from c. 

We say that a reaction /3 = (r, p, k) produces 5 G A if p(5') — r(S') > 0, and that (3 consumes 
S if t{S) — p(5) > 0. When a CRN C = (A, R) and an initial configuration i G N are clear from 



context, we say a species 5" € A is producible if there exists c such that i — 7>^ c and c{S) > 0. 

2.2 Kinetic model 

The following model of stochastic chemical kinetics is widely used in quantitative biology and other 
fields dealing with chemical reactions in which stochastic effects are significant [14]. It ascribes 
probabilities to execution sequences, and also defines the time of reactions, allowing us to study 
the running time of CRNs in Section 3. 

A reaction is unimolecular if it has one reactant and bimolecular if it has two reactants. We as- 
sume no higher-order reactions occur. '^ The kinetic behavior of a CRN is described by a continuous- 
time Markov process, whose states are configurations of the CRN, as follows. Given a fixed volume 

k 

V > and current configuration c, the propensity of a unimolecular reaction /3 : X —)■... in con- 

k 

figuration c is p(c, /3) = kc(X). The propensity of a bimolecular reaction /3 : X + y — > . . ., where 
X ^ Y, is p{c, (3) = -c{X)c{Y). The propensity of a bimolecular reaction /3 : X + X -^ ... 
is p(c,/3) = ^— — '-^ — Zz^_4 rjij^g propensity function determines the evolution of the system as 
follows. The time until the next reaction occurs is an exponential random variable with rate 
p(c) = X^flgj:jp(c,/3). The probability that next reaction will be a particular /3next is ^ (cT " 

3 CRNs that produce all species in constant time 

We say a CRN respects finite density if there is a constant c such that, for any initial configuration 
i and any configuration c such that i — t-* c, ||c|| < c||i||. That is, the maximum molecular count 
attainable is bounded by a constant multiplicative factor of the initial counts, implying that if 
the volume is sufficiently large to contain the initial molecules, then within a constant factor, it is 
sufficiently large to contain all the molecules ever produced. We therefore safely assume that for 
any CRN respecting finite density with initial configuration i, the volume is ||i||. 

CRNs respecting finite density constitute a wide class of CRNs that includes, for instance, 
all mass-conserving CRNs (CRNs for which there exists a mass function m : A —?■ M'^ such that 

X^^Gr'^l'^) ~ ^Sep^^) ^°^ ^^^ reactions (r,p, /c)). 

In this section we prove that no CRN respecting finite density, starting from an initial configu- 
ration with "large" species counts, can delay the production of any producible species for more than 
a constant amount of time. All producible species are produced in time 0(1) with high probability. 

Let a > 0. We say that a configuration c is a-dense if for all 5 G A, 5 € [c] =^ c(S') > a||c||. 
Given initial configuration i G N^, let A? = { 5 G A | (3c) i -^* c and c(5) > } denote the set 
of species producible from i. For all t > and S* G A, let ^tS be the random variable representing 
the count of 5 after t seconds, if i is the initial configuration at time 0. 

The following is the main theorem of this paper. 

Theorem 3.1. Let a > and C = (A, i?) be a CRN respecting finite density. Then there are 
constants e,6,t > such that, for all sufficiently large n (how large depending only on a and C), 
for all a-dense initial configurations i with ||i|| = n, Pr[(VS' G A?) ^tS > 5n] > 1 — 2"*^". 



■^This assumption is not critical to the proof of the main theorem; the bounds derived on reaction rates hold 
asymptotically even higher-order reactions are permitted, although the constants would change. 

^Intuitively, p(c, /3) without the k and v terms counts the number of ways that reactants can collide in order to 
react (with unimolecular reactants "colliding" with only themselves). 



Proof. The handwaving intuition of the proof is as fohows. We first show that every species initially 
present, because they have count il.{n) and decay at an at most exponential rate, remain at count 
Q{n) (for some smaller constant fraction of n) for a constant amount of time. Because of this, during 
this entire time, the reactions for which they are reactants are running at rate i}(n). Therefore 
the products of those reactions are produced quickly enough to get to count Q{n) in time 0(1). 
Because those species (the products) decay at an at most exponential rate, they have count il.{n) 
for a constant amount of time. Therefore the reactions for which they are reactants are running at 
rate i}{n). Since there are a constant number of reactions, this will show that all producible species 
are produced in constant time with high probability. The tricky part is to show that although 
species may be consumed and go to lower counts, every species produced has count il.{n) for ^^(1) 
time, sufficient for it to execute il(n) reactions of which it is a reactant and produce 0,{n) of the 
products of that reaction. 

On to the gritty details. Because C respects finite density, the maximum count obtained by any 
species is 0{n). Let c be a constant such that all species have count at most en in any configuration 
reachable from i. Let K = X](rpfc)G_R^ ^^ ^^^ snvo. of the rate constants of every reaction. Let 
k = min(r,p,A:)eR ^ be the minimum rate constant of any reaction. For any A C A, define 

PROD (A) = { 5 G A I (3/3 = (r, p. A;) G i?) /? produces S and [r] C A } 



to be the set of species producible by a single reaction assuming that only species in A are present, 
and further assuming that those species have sufficient counts to execute the reaction. 

Define subsets Aq C Ai C . . . C A^-i C A^ ^ A as follows. Let Aq = [i]. For all i G Z"*", define 
Aj = Aj_i U PROD(Aj_i). Let m be the smallest integer such that PROD(Am) C A^. Therefore 
for all i G {1, . . . , m}, |Aj| > jAj_i|, whence m < [A[. 

By the hypothesis of the theorem, all S" G Aq satisfy \{S) > an. S may be produced and 
consumed. To establish that i^S remains high for a constant amount of time, in the worst case we 
assume that S is only consumed. Let A = cK. We will assume that A > 1 since we can choose 
c > i if it is not already greater than -i. This assumption is required in Lemma B.7, which is 

employed later in the proof. We also assume that c > 1, implying X > K. 

For all S" G A, the rate of any unimolecular reaction consuming S is at most Ki^S < X#S, 
and the rate of any bimolecular reaction consuming S is at most —{cn)^S = X^S, whence the 
expected time for any reaction consuming S is at least j^- We can thus upper bound the overall 
consumption of S by an exponential decay process with rate A, i.e., this process will consume copies 
of S at least as quickly as the actual CRN consumes copies of S. 

Let c = 4:6^^"^^^' . Let Sq = —. Fix a particular S G Aq. Let N = an and 6 = - in Lemma A.l. 
Then by the fact that the rate of decay of 5 is bounded by an exponential decay process with rate 
A and initial value an (the process Dv^ as defined in Section A) and Lemma A.l, 



Pr[(3t G [0, m + 1]) #tS < don] < Pr 



By the union bound. 



D^"(m + 1) < -an 



1 \ an/c— 1 



Pr[(35GAo)(3tG[0,m + l]) #^5 < M < |Ao|2" 



-<5on+l 



(3.1) 



That is, with high probability, all species in Aq have "high" count (at least S^n) for the entire 
first m + 1 seconds. Call this event H(Ao) (i.e., the complement of the event in (3.1)). 
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We complete the proof by a "probabilistic induction"^ on f € {0, 1, . . . ,m} as follows. Induc- 
tively assume that for all X G A, and all t £ [i,m + 1], t^jX > 5in for some 6i > 0. Call this event 
-ff(Aj). Then we will show that for all S G Aj+i and all i € [i + l,m + 1], assuming H{Ai) is true, 
with high probability ^jS > (5j+in for some (5j+i > 0. We will use Lemma B.7 to choose particular 
values for the Jj's, and these values will not depend on n. 

The base case is established by (3.1) for Sq = ^. Fix a particular species S S Aj+i. By the 
definition of Aj+i, it is produced by either at least one reaction of the form X ^ S + . . . for some 
X € Aj or by X + y — )■ 5 + . . . for some X,Y G A^. By the induction hypothesis H(Ai), for all 
t € [ijOT, + 1], #iX > 6in and #tY ^ ^in for some 6i > 0. In the case of a unimolecular reaction, 
the propensity of this reaction is at least kSiU. In the case of a bimolecular reaction, the propensity 
is at least |(<5in)2 = k6fn if X ^ F or AM^2^ = k^sfn - 1) if X = y. 

The last of these three possibilities is the worst case (i.e., the smallest). Because 6i is a constant 
independent of n, for sufficiently large n {n > 2/5?), dfn — 1 > 5fn/2, whence the calculated rate 
is at least i5fn. 

Let 5f = —^ and ^t = -^- Similar to the argument above concerning the maximum rate of 
decay of any S* G Aq, the rate at which reactions consuming S € Aj+i occur is at most \j^S. Also, 
by the above arguments, the minimum rate at which reactions producing S proceed is at least (5jn. 
Therefore we may lower bound the net production of S (i.e., its total count) by a continuous-time 
random walk on N that starts at 0, has constant forward rate bfU from every state z to i -|- 1, and 
has reverse rate \i from i to i — 1, defined as the process W^, -^ in Section B, Lemma B.7. By this 

lower bound and Lemma B.7^ (recall that we have assumed A > 1), 



Pr 



max j^~^S < Sr-n 

t6[«,i+l] 



< Pr 



max Wf ^{i) < SrU 

te[o,i] '' 



^ 2~&fn/22+l _ 2-fc<5j^n/88+l 



(3.2) 



Therefore with high probability ij^S reaches count at least SrU at some time i G [i,i + 1]. Let 
(5j+i = — . As before, we can bound the decay rate of S from time i until time m + 1 by an 
exponential decay process with rate A and initial value J^n, proceeding for m + 1 seconds (since 
i > 0). By Lemma A.l, 



Pr[(3i G [t, m + 1]) #tS < J.+m] < Pr 



\6rn 



A 



D^'-"(m + l) < -5rn 



1 \ A(5rn/c— 1 



2" <5i+in+l 

(3.3) 



Fix a particular S G Aj+i. By the union bound applied to (3.2) and (3.3) and the fact that 
t < i -|- 1, the probability that ^^5 < (5j+in at any time t € [i -|- 1, tti + 1], given that the induction 
hypothesis H{Ai) holds (i.e., given that #iX > 6in for all X € Aj and all t € [i,m + 1]), is at 
most 2~''^i"-/^^~^^ + 2~<^'+i"'+^. Define i/(Aj+i) to be the event that this does not happen for any 
S € Aj+i, i.e., that (yS € Aj_|_i)(Vt € [i,m + 1]) #tS > 6i+in. By the union bound over all 
S £ Aj+i, 

Pr [^H{Ai+i)\H{Ai)] < \Ai+i\ /'2-H'"/88+i ^ ^s^+.n+A (3^4) 



''In other words, we show that if the induction hypothesis fails with low probability p, and if the induction step 
fails with low probability q, given that the induction hypothesis holds, then by the union bound, the induction step 
fails with probability at most p + q. 

Although Lemma B.7 is stated for times between and 1, here we use it for times between i and i + 1 (just shift 
all times down by i). 



By the union bound applied to (3.1) and (3.4), and the fact that 6m < Si for alH G {0, . . . , m—1}, 
the probabihty that any step of the induction fails is at most 



Pr[^H{Ao)] + Y, PrhH{A,)\H{Ai^i)] < |Ao|2-^""+^ + ^ \Ai\ r2-'^'5.-i«/88+i + 2-^'"+i 

< lAl (^2"'^'""+^ + 2-^^™"/88+i') < jAI /'2-^^m'"/88+2 






(3.5) 

At this point we are essentially done. The remainder of the proof justifies that the various 
constants involved can be combined into a single constant e that does not depend on n (although 
it depends on C and a) as in the statement of the theorem. 

By our choice of (^i+i, we have 



Sr 6f 6fk I 6ik 

Oi+l — = TT- = TTT^ > 



c 4Ac 16Ac \16Ac/ 
Recall that (5o = f . Therefore, for ah i G {1, . . . ,m}, Jj > (j^^) ■ So 

ak \ I ak 



<5^ > ( T77T72 1 >\tFT7i] ■ (3-6) 



/ ak 



16Ac2 / - \ 16Ac 

2|A|-i 



Therefore, if n > I T§r-^ I , then by (3.5) and (3.6), the failure probability in (3.5) is at most 



<y.k ^ 



= |A|2"'(-K"--')^) "'''^' = |A|2-K^^^iJWt))'""/««+^ < |A|2"<^i^rf^)'""/«^^l 

Letting e' = -^ ( -"^^..^, J implies failure probability at most |A|2~'^'"+^. For n > 

^+I}2s\M^ |A|2-^''^+2 < 2-(^'/2)n_ Letting t = m + l, 5 = 6m, and e = e'/2 completes the proof. D 

We have chosen t = m + 1 (i.e., t depends on C); however, the same proof technique works if 
we choose t = 1 (or any other constant independent of C), although this results in smaller values 
for S and e. 

Although the choice of e is very small, the analysis used many very loose bounds for the sake of 
simplifying the argument. A more careful analysis would show that a much larger value of e could 
be chosen, but for our purposes it suffices that e depends on C and a but not on the volume n. 

However, it does seem fundamental to the analysis that e < 7^ for some < 7 < 1. Consider 
the CRN with n initial copies of Xi in volume n and reactions 

Xi ^ 0, X2 ^ 0, ... Xm ^ 0, 

-'^l + -^1 — > X2, X2 + X2 — > X3, ... Xm + Xm — > Xm+l 

9 



This is a CRN in which the number of stages m is actually equal to its worst-case value |A| — 1, 
and in which each species is being produced at the minimum rate possible - by a bimolecular 
reaction ~ and consumed at the fastest rate possible - by a unimolecular reaction (in addition to a 
bimolecular reaction) . Therefore extremely large values of n are required for the "constant fraction 
of n" counts in the proof to be large enough to work, i.e., for the production reactions to outrun 
the consumption reactions. This appears to be confirmed in stochastic simulations as well. 

4 Conclusion 

The reason we restrict attention to CRNs respecting finite density is that the proof of Theorem 3.1 
relies on bounding the rate of consumption of any species by an exponential decay process, i.e., no 
species S is consumed faster than rate X#S, where A > is a constant. This is not true if the CRN 
is allowed to violate finite density, because a reaction consuming S such as X + S —^ X proceeds 
at rate i^X^S/n in volume n, and if ^X is an arbitrarily large function of the volume, then for 
any constant A, this rate will eventually exceed A#5. 

In fact, for the proof to work, the CRN need not respect finite density for all time, but only 
for some constant time after t = 0. This requirement is fulfilled even if the CRN has non-mass- 
conserving reactions such as X — > 2X. Although such a CRN eventually violates finite density, for 
all constant times t > 0, there is a constant c > such that i^tX < c^^qX with high probability. 

The assumption that the CRN respects finite density is a perfectly realistic constraint satisfied 
by all real chemicals,^ and as noted, many non-mass-conserving CRNs respect finite density for a 
constant amount of initial time. Nevertheless, it is interesting to note that there are syntactically 
correct CRNs that violate even this seemingly mild constraint. Consider the reaction 2X — )■ 3X. 
This reaction has the property that with high probability, i^X goes to infinity in constant time. 
Therefore it is conceivable that such a reaction could be used to consume a species Y via X+Y — > X 
at such a fast rate that, although species S is producible via Y —^ S, with high probability S is 
never produced, because copies of Y are consumed quickly by X before they can undergo the 
unimolecular reaction that produces S. 

We have not explored this issue further since such CRNs are not physically implementable by 
chemicals. However, it would be interesting to know whether such a CRN violating finite density 
could be used to construct a counterexample to Theorem 3.1. 

An open problem is to precisely characterize the class of CRNs obeying Theorem 3.1; as noted, 
those respecting finite density with dense initial configurations are a strict subset of this class. 

Acknowledgements. The author is very grateful to Chris Thachuk, Anne Condon, Damien 
Woods, Manoj Gopalkrishnan, David Soloveichik, David Anderson, and Lea Popovic for many 
insightful discussions. 



^The reader may notice that recent work has proposed concrete chemical implementations of arbitrary CRNs using 
DNA strand displacement [5,21]. However, these implementations assume a large supply of "fuel" DNA complexes 
supplying mass and energy for reactions such as X — >■ 2X that violate the laws of conservation of mass and energy. 
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Technical appendix 

A ChernofF bound for exponential decay 

In this section, we prove a Chernoff bound on the probabihty that an exponential decay process 
reduces to a constant fraction of its initial value after some amount of time. 

For all A^ G Z"*" and A > 0, let Dy^ (t) be a Markov process on {0,1,..., A^} governed by 
exponential decay, with initial value A^ and decay constant A; i.e., D^ (t) = N for all t € [0,Ti), 
Df (t) = AT - 1 for alH e [Ti, Ti + T2), D^(i) = A^ - 2 for alH G [Ti + T2, Ti + T2 + T3), etc., 
where for i € {1, . . . , A^}, Tj is an exponential random variable with rate A(A^ — i + 1). 

Lemma A.l. Let N G Z+, X,t>0, and < 6 < I. Then 

Pr[Df (i) < 5N] < {26e^^Y^-\ 

Proof. Let e be the largest number such that e < 6 and eA^ G N. Then eA^ < 6N < eN + 1. 

Define T = X^j^x*^ Tj as the random variable representing the time required for N — eN decay 
events to happen, i.e., for D^ to decay to strictly less than 6N from its initial value A^. 

If X is an exponential random variable with rate A', then the moment-generating function 
Mx : M ^ M of X is M:s.{0) = E[e^^] = ^, defined whenever |(9| < A' [8]. Therefore 



M' 



T^\ 



A(A^-i + l) N-i + l 



x{N-i + i)-e iv_i + i_f 



A 



defined whenever \0\ < X(N — i + l). In particular, the smallest such value is A(A^ — (A^ — eA^) + l) 
A(eA^ + 1), so we must choose |^| < A(eA^ + 1). 

Because each Tj is independent, the moment-generating function of T is 



Mr,si0) = E[e^T^] = E[e^E"-^"T, 



'N~eN 

n 

. i=l 



e^T, 



N-eN N-eN at • -, ^ 

- n Ei^-1 - n ^^S^ - n -^,- 

j=l i=l ^^ '^-^ A i=eN+l'' A 

For any t > 0, the event that T < t (it takes fewer than t seconds for A^ — eA^ decay events 
to happen) is equivalent to the event that Di^(t) < 5N (D^, which started at D^(0) = A^, has 
experienced at least A^ — eA'^ decay events after t seconds to arrive at Dj^(i) < 6N). 
Using Markov's inequality, for all < 0, 

pr OT^] 1 ^ 

Pr[B^it)<6N] = Pr[T'<t] = Pr[e'^'>e'']<%^ = -^ J] "^ 

^ ^ i=eN+l'^ A 

Let 6 = —XeN. This implies \9\ = XeN < X{eN + 1) as required for MTi{0) to be defined for 
allie {l,...,N -eN}. 
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Then 

N 

Pr[D^(t) < 6N] < e^^^* JJ 



i + eN 

i=€N + l 

XeNt I eTV \ /" eN + 1 \ f N -1 \ f N 



eN + eN + lJ \eN + eN + 2j \N + eN - 1 J \N + eN 
;,,^i i€N){€N + l)...{eN + eN- 2)(e7V + eN - 1) 



{N + 1){N + 2)...{N + eN-l){N + eN) 



cancel terms 



which completes the proof. D 

B ChernofF bounds for biased random walks 

This section proves ChernofF bounds for biased random walks that are used in Section 3. Lemma B.6 
shows that a random walk on Z with forward rate / and reverse rate f has a high probability to 
take at least J^((/ — r)t) net forward steps after t seconds. Lemma B.7 uses Lemma B.6 to show 
that a random walk on N (i.e., with a reflecting barrier at state 0) with reverse rate in state i 
proportional to i, has a high probability to reach state j in time t, where j is sufficiently small 
based on the backward rate and t. 

We require the following Chernoff bound on Poisson distributions, due to Franceschetti, Dousse, 
Tse, and Thiran [13]. 

Theorem B.l ( [13]). Let P(A) be a Poisson random variable with rate A. Then for all n £N 

Pr [P(A) > n] < e" 

if n > \ and 

Pr[P(A) <n] <e-^(—\ 

if n < X. 

The following corollaries are used in the proof of Lemma B.6. 

Corollary B.2. Xei < 7 < 1. Then Pr[P(A) < 7A] < e"^ (^V = ( ^^ 



-A |£^ 

n 



7A 



Corollary B.3. Lei 7 > 1. Then Pr[P(A) > 7A] < e"^ f ^ 



7A 



7A 



We require the following bound on the natural logarithm function. 

Lemma B.4. Let S > 0. Then 

ln(l + 5)<^+ ^ 



2 2(1 + 5)' 
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1-8" 




(a) 



Lemma B.4: f^'*'^ i dx < area(i?) - area(T). (b) Lemma B.5: f^_g ^ dx > area(_R) + area(T). 

Figure 1: Illustration of Lemmas B.4 and B.5 (not to scale). 



Proof. See Figure la for an illustration of the geometric intuition. Recall that for a > 1, Ina = 
/" - dx. Since ^ is convex, the area defined by this integral is at most the area of R, the rectangle 
of width 5 and height 1, minus the area of T, the right triangle of width 5 and height 1 — j—^. 
Therefore 



ln{l + 6) 



1+5 



1 



— dx < 5 6 

X 2 



1 



1 



l + S 



5_ ( 

2^2(T 



6) 



D 



A often-useful upper bound is ln(l + 5) < 6, i.e., using the area of R as an upper bound. 
Interestingly, shaving 6 down to ^ + 2(1+5) ^^ subtracting area(T) is crucial for proving Lemma B.6; 
using the weaker bound ln(l + 6) < 5 in the proof of Lemma B.6 gives only the trivial upper bound 



52 



of 1 for the probability in that proof. The same holds true for the next lemma (i.e., the term -^ is 
crucial), which is employed similarly in the proof of Lemma B.6. 

Lemma B.5. Let < S < 1. Then 



ln{l-6)<-5--. 

Proof. See Figure lb for an illustration of the geometric intuition. Recall that for < a < 1, 
— Ina = J" - dx. The area defined by this integral is at least the area of R, the rectangle of width 



5 and height 1, plus the area of T, the right triangle of width 6 and height S. This is because 
^- = —1 at the value x = 1 and ^^ < ~1 for all < x < 1, so the hypotenuse of T touches the 
curve at X = 1 and lies strictly underneath the curve for all 1 — 5 < x < 1. Therefore 



/•I 1 
ln{l-6) = - dx>6 + 

Jl-5 X 
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B.l Chernoff bound for biased random walk on Z 

Let /, f > 0. Let U ? -(t) be a continuous-time Markov process with state set Z in which U ? ^,(0) = 0, 
with transitions from state i to i+ 1 with rate / for all i G Z, and with transitions from state i + 1 
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to i with rate r for all i G Z. In other words, U; -(t) is a continuous time biased random walk on 
Z, with forward bias / and reverse bias r. 

Lemma B.6. For all f > f > and all t,e > 0, 



Pr 



U^-,(t)<(l-6)(/-r)i 



< 2e 8/ 



Proof. Since the forward and reverse rates of U ? . are constants independent of the state, the total 
number of forward transitions F and the total number of reverse transitions R in the time interval 
[0,t] are independent random variables, such that F — R = U? -(t). F is a Poisson distribution 

with rate f = ft (hence E[F] = /) and R is a Poisson distribution with rate r = ft. 

Let e = i/2. Let d = f - r > 0. Let 7 = ^ = (i-^)/+^^ Let A = /. Let 5 = n^^{~jl , and 
note that 1 + 5 = q_ /. , ^ . By Corollary B.2, 



Pr[F < 7AJ < 



(l-e)/+er 



^ 7 y I (l-e)/ + er 

= ((1 + 5)exp (1 - (1 + <5)))(^-')^+'" = exp (ln(l + 5)- <5))(i-^)/+^'^ 

< exp — I — — — 6 ] by Lemma B.4 

^V2 2(1 + 5) J ^ 

= 6xp — — — = exp 



2(1 + 5) J V2(l + <5) 

(l-e)f+erj _ l^(l-e)/+er^ 

''''^ ' 2 f 1 + </--) ^ " ''''^ 9 Al-^)/+^r-+.(/-r) ' 
^l^+ (l-£)/+£r;y \-l, {l-e)f+er 

exp I ^^'^'^^) - exp ( -«/-^)) ] 

" (T^fc. j -^^P^2/((l-.)/ + e.)J 

^exp(d.(4^). 
Let 7' = 1+^ = ^^+(^-^)^' and A' = r. Let 6 = -^H^, and note that 1-6= '. , . By 
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Corollary B.3, 




e/ + (l-e)r 
((1 - <5)exp (1 - (1 - 6))y^+^^-'^' = exp (ln(l -6) + 6)'^+^^-'^' 

^2 \ f/+(l-e)r 



< exp [ —6 — TT ~^ ^ I by Lemma B.5 

{<f-r)r Y^^^'-'^^ _ [ -(6(/-r)) 



exp -— — — -J = exp 



^2 



2(e/ + (l-e)r)V " V2(e/ + (1 - e)r) 

< exp ( -^<f-')^" \ - exp ( -^<f-')n (B 2) 

Observe that 7A - 7'A' = f -r- 2e(/ - r) = (1 - 2e)(/ - r). By (B.l), (B.2), and the union 
bound, Pr[F - R < (1 - 2e)(/ - r)] < 2 • exp (-^^^j^) • Substituting the definitions / = ft, 
r = rt, and e = e/2 completes the proof. D 

B.2 Chernoff bound for random walk on N with state-dependent reverse bias 

Let N € Z+, let Sf,Xr > 0, and let W^^ a.(^) ^^^ t > be a continuous-time Markov process with 
state set N in which W^ ^ (0) = 0, with transitions from state i to state i + 1 with rate 6fN for 
all i € N, and with transitions from state i + 1 to i with rate Xri for all i € N. In other words, 
W|^ ^ is a continuous time random walk on N with a reflecting barrier at 0, in which the rate of 
going forward is a constant SfN, and the rate of going in reverse from state i is proportional to i 
(with constant of proportionality A^). 

Lemma B.7. Let A^ > 1, let 5f,5r > such that 5r < j^- Then for all N G Z'^ such that 

N > 6/{6f), 



Pr 



ma.xWZ^^{t)<5rN 
ie[o,i] ^ 



^ 2-Sf N/22+1 



Proof. Consider the event that (Vi € [0,1]) W|^ ^ (i) < 6rN. Then in this case, the maximum 
reverse transition rate is at most XrSrN < 5fN /A. 

For t € [0, 1], consider the random walk '^StN.SfN/iif) defined as in Lemma B.6 as a Markov 
process on Z (i.e., the states are allowed to go negative) in which the forward rate from any state 
i G Z is 5fN as in W^ a.(^)' ^^^ ^^^ reverse rate from any state i € Z is SjN/A, which is an upper 
bound on the reverse rate of W^^ a.(^) ^c"^ ^^Y state i G {1, . . . , 6rN}. 

Therefore Pr[(V£ G [0,1]) W|^^^^(t) < 5rN] < Pr[(Vt G [0,1]) Vs^.N,5fN/4{i) < SrN], since 
^5fN,5fN/4: has a strictly higher reverse rate and does not have a reflecting barrier at state i = 0, 
hence is strictly less likely never to reach the state 6rN at any time t G [0, 1]. 

We prove the theorem by bounding Pr[\JsrSfN/4i^) < 6rN]. 
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Lemma B.6, with e = g, t = 1, / = 5fN and f = SjN/A implies that 



Pr 



'^5fN,5fN/4,{^) < —7- 



Pr 



U5,jv,5,7V/4(l) < (1 - ^) (^/^ - ^/^/4) 



[%Y{5jN-6jN/i)'^ 
8SfN 



< 2e ^'f" by Lemma B.6 

Note that e"^/'^^ = 2-"/(32in2) < 2-"/22 f^^ ^^^ ^ > q^ Therefore, 

6fN- 



Pr 



U5^.Ar,5^Ar/4(l) < 
Since 6f > AXrSr and A,. > 1, 6fN/A > Xr5rN > 5rN, so 



^ 2 . 2-'5/^/22 _ 2-'5/A^/22 



+1 



Pr 



U5^Ar,5jAr/4(l) < f^r-A^ 



^ 2-^/^/22+1 



By our observation that Pr[(V£ € [0, 1]) Wf. ;^^(i) < 6rN] < Pr[(Vt e [0, 1]) \J5j,N,SfN/4{i) < SrN], 
and since U5 jv,(5fAr/4(l) ^ i^r-^ is a counterexample to the latter event with t = 1, 



completing the proof. 



Pr 



te[o,i] ' 



^ 2-'5/^/22+i 
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